Plasmodium falciparum Genetic Diversity Maintained and 
Amplified Over 5 Years of a Low Transmission Endemic in the 
Peruvian Amazon 



OraLee H. Branch,*!' 1 Patrick L Sutton,!' 1 Carmen Barnes, 2 Juan Carlos Castro, 3 Julie Hussin, 4 
Philip Awadalla, 4 and Gisely Hijar 2 

department of Medical Parasitology, New York University 

2 l_aboratorio de Biotecnologia y Biologia Molecular, Instituto Nacional de Salud, Lima, Peru 

3 l_aboratorio Investigaciones Productos Naturales Antiparasitarios de la Amazonia, Universidad Nacional Amazonia Peruana, 
Iquitos, Peru 

4 Sainte-Justine Research Centre, University of Montreal, Montreal, Quebec, Canada 
tThese authors contributed equally to this work. 
'Corresponding author: E-mail: oralee.branch@nyumc.org. 
Associate editor: Daniel Falush 

Abstract 

Plasmodium falciparum entered into the Peruvian Amazon in 1994, sparking an epidemic between 1995 and 1998. Since 2000, 
there has been sustained low P. falciparum transmission. The Malaria Immunology and Genetics in the Amazon project has 
longitudinally followed members of the community of Zungarococha (N = 1,945, 4 villages) with active household and health 
center-based visits each year since 2003. We examined parasite population structure and traced the parasite genetic diversity 
temporally and spatially. We genotyped infections over 5 years (2003-2007) using 14 microsatellite (MS) markers scattered 
across ten different chromosomes. Despite low transmission, there was considerable genetic diversity, which we compared 
with other geographic regions. We detected 182 different haplotypes from 302 parasites in 217 infections. Structure v2.2 
identified live clusters (subpopulations) of phylogenetically related clones. To consider genetic diversity on a more detailed 
level, we defined haplotype families (hapfams) by grouping haplotypes with three or less loci differences. We identified 34 
different hapfams identified. The F sc statistic and heterozygosity analysis showed the five clusters were maintained in each 
village throughout this time. A minimum spanning network (MSN), stratified by the year of detection, showed that 
haplotypes within hapfams had allele differences and haplotypes within a cluster definition were more separated in the later 
years (2006-2007). We modeled hapfam detection and loss, accounting for sample size and stochastic fluctuations in 
frequencies overtime. Principle component analysis of genetic variation revealed patterns of genetic structure with time 
rather than village. The population structure, genetic diversity, appearance/disappearance of the different haplotypes from 
2003 to 2007 provides a genome-wide "real-time" perspective of P. falciparum parasites in a low transmission region. 
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Background 

Plasmodium falciparum malaria causes more than 1 million 
deaths annually in Sub-Saharan Africa alone. Although ma- 
laria is being reduced in some historically high-malaria 
transmission regions, it is spreading into new geographic 
regions. Malaria epidemics begin when infected mosqui- 
toes enter into a susceptible population of humans. Recur- 
rent transmission of malaria parasites by mosquitoes 
within a human population may establish a long-term, per- 
sistent, endemic transmission cycle. During endemic trans- 
mission, parasites that have been successfully transmitted 
over time will inevitably undergo some genetic changes by 
random point mutation, replication slippage, or recombi- 
nation. The redetection or loss of these parasites defines 
the parasite population structure. 



In regions of historically high transmission, the popu- 
lation structure is obscured by frequent overlapping infec- 
tions. In contrast, the discrete, nonoverlapping infections 
in low and recent malaria transmission areas promise a less 
ambiguous characterization of the malaria parasite pop- 
ulation structure. For that reason, we began our study in 
the Peruvian Amazon Jungle. Plasmodium falciparum was 
first detected in perimeter regions of the jungle city, 
Iquitos, in 1994, with an epidemic occurring between 
1995 and 1998. This epidemic was curbed, likely by 
effective intervention efforts of fumigation and free, 
highly controlled drug treatment. Since 2000, there has 
been sustained low P. falciparum transmission (Roberts 
et a I. 1997; Aramburu Guarda et al. 1999; Roshanravan 
et al. 2003; Branch et al. 2005). 
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The Malaria Immunology and Genetics in the Amazon 
project, established in 2003, follows parasite genetic diver- 
sity and human host immune responses over time. In 
2003-2004, combined active and passive case detection de- 
scribed a transmission rate of less than 0.5 P. falciparum 
infections/person/year in Zungarococha, a community of 
approximately 1,900 individuals located south of Iquitos 
in one of the highest P. falciparum transmission regions 
(Branch et al. 2005). Our previous work showed that al- 
though the frequency of having more than one parasite 
clone (genotype) per infection was low, there is significant 
population-level diversity (PLD) in P. falciparum parasites 
circulating in Zungarococha (Branch et al. 2005; Chenet 
et al. 2008; Sutton et al. 2009). 

The current study characterizes P. falciparum parasite 
population structure, temporally and spatially, using 14 mi- 
crosatellite (MS) markers scattered across ten chromosomes. 
In general, previous MS studies have demonstrated that 
the number of alleles and diversity of alleles (heterozygos- 
ity) across various MS markers decrease with decreasing P. 
falciparum transmission (Anderson et al. 2000; Leclerc et al. 
2002; Hoffmann et al. 2003; Machado et al. 2004; Bogreau 
et al. 2006; Dalla Martha et al. 2007; Zhong et al. 2007; 
Bonizzoni et al. 2009). Such decreases in genetic diversity 
are generally is attributable to both low endemicity and 
fewer opportunities for sexual recombination between geneti- 
cally distinct parasites circulating in low transmission, result- 
ing in linkage of markers across chromosomes (Conway et al. 
1997; Anderson et al. 2000; Durand et al. 2003). Genetic link- 
age in low transmission settings can be particularly valuable 
for tracking parasites in a population temporally and spatially. 
Only one prior P. falciparum MS study considered diversity 
over time and reported that parasites defined by MS haplo- 
types appeared/disappeared in a way consistent with random 
neutral events, that is, immigration of new clones and loss of 
clones due to genetic drift (Orjuela-Sanchez et al. 2009). 

This study had two main objectives: 1) Determine the 
overall parasite population structure and subgrouping of 
similar genotypes temporally and spatially and 2) Determine 
how allele frequencies change with time in this low-endemic 
region. Each of the alleles had between 3 and 8 polymorphic 
forms. Parasites were defined by concatenating alleles de- 
tected in 14 MS loci into individual haplotypes. Considering 
infections detected within the four networked villages of 
Zungarococha over 5 years, we found 182 different haplo- 
types from 302 parasites in 217 discrete infections. Although 
this is a hypoendemic region, the maintenance of parasite 
diversity over time is consistent with a high-effective pop- 
ulation size of P. falciparum in this low-endemic region. Trac- 
ing genetic diversity over time and space might help us to 
understand how P. falciparum has persisted in this human 
population from 2003 to present. 

Materials and Methods 

Study Design 

This longitudinal study employed both active and passive 
case detection surveillance methods of 1,945 individuals 



from Zungarococha, in the San Juan District south of 
Iquitos, Peru from 2003 to 2007. The San Juan district is 
an epicenter of P. falciparum and P. vivax transmission, 
with infections typically occurring during the rainy season 
from January to July. The community of Zungarococha is 
a network of four villages: Zungarococha town (ZG), Puerto 
Almendra (PA), Ninarumi (NR), and Llanchama (LL). There 
is a distance of approximately 1 km between ZG, PA, and 
NR (well within Anopheles darlingi immigration range) and 
approximately 5 km between ZG and LL (although possible, 
it is not likely to have mosquito migration between ZG and 
LL). Zungarococha was selected based upon existing re- 
ports of locally acquired P. falciparum malaria infections, 
a high P. falciparum incidence rate compared with commu- 
nities in and surrounding Iquitos, the acceptance by the 
community, and the fact that the community is composed 
of four villages each serviced by the same health center. 

Active case detection (ACD) consisted of one blood 
sample collected per consenting individual in the begin- 
ning and ending of the malaria season (January to July). 
ACD also included weekly blood sampling during at least 
1 of the 7 transmission months. Sample collection in ACD is 
orchestrated by a physician who obtains fingerprick blood 
samples or BD Vacutainer (Becton, Dickinson and Co., 
Franklin Lakes, NJ) blood samples along with a comprehen- 
sive epidemiology/demographic and clinical questionnaire. 

In addition to ACD, all individuals who presented with 
a fever or reported fever were tested for malaria by micros- 
copy upon visiting the community health center staffed by 
our team and the Peruvian Ministry of Health. Details of ACD 
(asymptomatic and symptomatic) and passive case detection 
(symptomatic) are described in Branch et al. (2005). 

Whether asymptomatic or symptomatic, treatments are 
given at no cost to the patient. All treatments were given 
through the MINSA authorities, following the MINSA Na- 
tional Drug Policy Guidelines. Plasmodium vivax treatment 
is chloroquine (10 mg/kg for 3 days) with primaquine (0.5 
mg/kg for 7 days). Plasmodium falciparum treatment is me- 
floquine (12.5 mg/kg daily for 2 days) with artesunate (4 
mg/kg daily for 3 days) in nonpregnant patients older than 
1 year of age. These are given as observed drug treatment 
therapy; all malaria cases are diagnosed and reported so 
that there is limited access to these drugs outside of the 
MINSA system. 

All blood samples, positive or negative by microscopy, 
underwent DNA extraction by Qiagen DNeasy Blood 
and Tissue Kits (Qiagen Inc., Valencia, CA) and were tested 
for presence of Plasmodium species (Rubio et al. 1999) us- 
ing a seminested multiplex polymerase chain reaction 
(PCR) method targeting DNA encoding the ssrDNA 
(Branch et al. 2005; Sutton et al. 2009). 

DNA samples were selected by including an average of 
28.9% (minimum = 16.6, maximum = 46.0; standard de- 
viation [SD] = 11.1) of the malaria infections detected in 
each year. Malaria infections were considered on a monthly 
basis, such that if there were PCR positive samples detected 
within 1 month of sampling, this was considered one 
infection (infection-month). Any P. falciparum clones 
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detected within the month were concatenated, possibly re- 
sulting in some infections having more than one clone de- 
tected throughout the infection (a multiclone, "mixed 
infection") (Sutton et aL 2009). As shown in Sutton 
et al. (2009), these mixed infections were likely due to 
the simultaneous inoculation of more than one clone dur- 
ing a single mosquito biting event, due to the extreme low 
transmission (<0.5 P. falciparum infections/person/year). 

Human Subjects Ethical Approval 
All protocols were reviewed and approved by the Institu- 
tional Review Boards at New York University, the Peruvian 
Ministry of Health Institutes of National Health, and the 
University of Alabama at Birmingham (former affiliation). 
We obtained written informed consent from all partici- 
pants in this study. In the case of minors less than 7 years 
old, the parents or guardians gave consent. In the case of 
minors between 7 and 18, both assent from the minor and 
consent from the parents or guardians were obtained prior 
to enrollment. 

Microsatellite Marker Selection and Amplification 
Fourteen MS markers were chosen from the MS linkage map 
for amplification. Six of these MS markers (ARA2, PfPK2, 
POLY A, TA42, TA1, and TA109) were chosen because they 
were used in previous studies in South America, including 
Bolivia, Brazil, the Brazilian Amazon, Columbia, and Peru 
(Anderson et al. 2000; Hoffmann et al. 2003; Machado 
et al. 2004; Dalla Martha et al. 2007). The same six markers 
were also used in regions of higher transmission, like the 
Democratic Republic of Congo, Papua New Guinea, Uganda, 
Western Kenya, and Zimbabwe; this enabled comparison of 
genetic diversity between low- and high-transmission re- 
gions (Anderson et al. 2000; Leclerc et al. 2002; Bogreau 
et al. 2006; Zhong et al. 2007). We used eight new MS 
markers (B5M5, BM17, C1M4, C1M67, C4M69, C9M11, 
C13M13, and Pf2802) because they are known to be highly 
polymorphic markers (suggestion courtesy of Dr Xin-zhuan 
Su). These eight markers were originally tested in the Na- 
tional Institutes of Health (NIH), Department of Infectious 
Disease laboratory with a test set of 20 DNA samples from 
our study population. These eight MS markers (B5M5, 
BM17, C1M4, and C13M13) have been used in recent anti- 
malaria drug studies (Vieira et al. 2004; Liu et al. 2008). 

PCR protocols were developed by Dr Su's team at the 
NIH Infectious Disease laboratory. PCR master mixes were 
comprised of fluorescently labeled primers at 5 ,uM (Inte- 
grated DNA Technologies, Inc, Coralville, IA), 1 mM con- 
centration of deoxy nucleoside triphosphate mixture 
(Invitrogen, Carlsbad, CA), MgCl 2 at a concentration of 
2 mM, 10x PCR buffer, Platinum Taq (Invitrogen), molec- 
ular grade water, and extracted DNA (genomic DNA ad- 
justed to 10-20 ng/jd) or whole-genome amplified DNA. 
All PCRs were performed in an Eppendorf Mastercycler 
ep (Westbury, NY). The PCR program was as follows: de- 
nature at 94 °C for 2 min, 1 cycle; denature at 94 °C for 20 s, 
anneal at 52 °C for 10 s, anneal at 47 °C for 10 s, extension 



time of 30 s at 60 °C, 42 cycles; extension time of 5 min at 
60 °C, 1 cycle. 

PCR product repeat-length sizes were determined on an 
ABI 3130x1 genetic analyzer (Applied Biosystems, Foster, 
CA). The use of three different fluorescently labeled pri- 
mers enabled three MS markers to be measured at the 
same time in a multiplexed assay. Each MS marker allele 
length was determined by using internal size standards 
(CeneScan 500 LIZ Size Standard, Applied Biosystems) with 
the GeneMapper v4.0 software. Only alleles detected with 
a peak height >200 fluorescent units were considered. 

During every microsatellite reaction, a Dd2 reference iso- 
late was included as a positive control, P. vivax DNA, hu- 
man DNA, and water were utilized as negative controls. 
Negative controls were used to ensure against PCR con- 
tamination and to rule out spurious bands that may be 
generated by nonspecific hybridization from human 
DNA. Moreover, the system was tested for P. falciparum 
specificity by attempting amplification of each primer with 
various P. vivax and P. falciparum samples. 

Data Analysis 

Assembling the Haplotypes 

The specificity of having multiple single copy loci (14 MS 
markers) results in a fine characterization of parasite clone 
frequencies in the population, defined by a MS marker hap- 
lotype. Methods from Anderson et al. (2000) were used to 
interpret individual clonal haplotypes from infections iden- 
tified as having more than one clone present (complex). 
Complex infections from microsatellite haplotype profiles 
can be identified by detection of two or more alleles at a sin- 
gle locus. However, it is possible that nonspecific marker 
binding will cause noise in the form of minor peak heights 
during the capillary electrophoresis reaction. Therefore, the 
standard method is to exclude minor peaks less than 1/3 
the height of the major peaks (alleles) (Anderson et al. 
2000). If only one locus had multiple peaks, then separating 
the clones of complex infections was unambiguous. In this 
method, single-clone haplotype profiles were assembled 
(phased). In the few instances where there was more than 
one locus with multiple alleles (35 samples from 217 infec- 
tions), major peaks were tentatively paired with major 
peaks and minor peaks were tentatively paired with minor 
peaks due to the peak height approximately representing 
the density of major and minor clones in an infection. Final 
haplotype profiles determination for the 35 complex infec- 
tions was made using an iteration analysis that determined 
agreement with the most parsimonious haplotype profile 
of each constituent of all complex infections. 

Population Genetic Diversity and Structure 
Expected heterozygosity (H e ) was used to quantify the 
amount of genetic diversity considering the haplotypes. 
Additionally, H e was calculated for each locus and then 
compared with the allele count observed at each locus. 
When H e is greater than expected based upon the observed 
number of alleles detected at each locus, it suggests that 
the population may have recently undergone a bottleneck 



1975 



Branch et al. • doi:10.1093/molbev/msq311 



MBE 



(Cornuet and Luikart 1996; Garza and Williamson 2001; 
Schultz et al. 2009). Using Bottleneck v1.2.02, we tested 
if this phenomenon was occurring in this Peruvian Amazon 
study, considering 1,000 simulations for mutations models 
of infinite alleles (IAM) and stepwise mutation (SMM). Ul- 
timately, we use SMM as it has been identified as a more 
stringent model when using microsatellite data (Luikart 
and Cornuet 1998; Iwagami et al. 2009). 

For population structure analysis, we used the unsuper- 
vised Bayesian clustering algorithm implemented in Struc- 
ture 2.2 to group individual MS haplotype profiles into 
genetically related clusters (Pritchard et al. 2000). Each hap- 
lotype was given an estimated membership coefficient (Q) 
that translates to the fraction of relatedness (or ancestry) 
within/between each cluster (Rosenberg et al. 2005). The 
number of related clusters (K) assumed in this program 
must be predicted, we predicted there would be between 
1 and 10 clusters within this study population. For each 
value of K, the clustering algorithm was performed 5 times 
for 10,000 Monte Carlo Markov Chain iterations, preceded 
by a burn-in period of 10,000 iterations. Population differ- 
entiation of each cluster was confirmed using Arlequin v3.5 
(Excoffier and Slatkin 1995; Excoffier and Lischer 2010). In- 
ferred ancestry values from Structure v2.2 were exported 
into Microsoft Excel and used to create individual bar plots 
for each sampling epoch in this longitudinal study. 

Linkage disequilibrium (LD), a measurement of the non- 
random association of alleles at different loci, was calcu- 
lated across all 14 loci comprising the MS haplotypes for 
each infection grouped according to hapfam and also by 
year of appearance (Arlequin 3.5, Excoffier and Lischer 
2010). Fisher's exact tests were used to determine statistical 
significance of LD and average percentage of linked loci per 
locus. 

Haplotype Family Analysis 

Each parasite clone was compared with each of the other 
clones (pairwise comparison). We grouped infections into 
haplotypes families based on the number of common loci. 
A haplotype family (hapfam) was defined by any haplotype 
profiles that were within three loci deviations of other hap- 
lotypes (79.0% concordance). Population differentiation of 
hapfams was tested using Arlequin v3.5 (Excoffier and Slat- 
kin 1995; Excoffier and Lischer 2010). We tested the average 
number of pairwise differences between hapfams and within 
hapfams, with Nei's distance as an additional measurement 
for genetic distance between populations (Nei and Li 1979; 
Reynolds etal. 1983; Peterson etal. 1995; Slatkin 1995). These 
data are reported in supplementary figure 1 (Supplementary 
Material online). 

We developed an estimator, n s , to compute redetection 
of hapfams within a fixed time frame (2003-2007). The es- 
timator is based on the idea that a hapfam with more in- 
dividuals at the start of the sampling period will have 
a higher probability of redetection at the end of the study 
period. When a sampled hapfam is not observed within 
a given year (n = 0), the probability that we sample this 
hapfam the following year is low. In contrast, when five or 



more individuals are observed in a given year (n > 5), the 
probability is high. Let P(R) be the probability of redetec- 
tion from year yl to year y2 given that the redetection de- 
pends on the number of individual haplotypes observed in 
y1 (nyl). P(R) is defined as: 

P(R) = P(R|n y1 = 0)P(n = 0) 

+ P(R|0<n y1 <5)P(0<n y1 <5) + P(R|n y1 > 5)P(n>5). 

(1) 

P (n = 0) is the proportion of families that were not 
observed in a given year. Similarly, P (0 < n y) < 5) and 
P (iyi > 5) are the proportion of families that were 
observed with 0 < n < 5 and n > 5, respectively. 

The conditional probabilities, P(R|n y2 = 0), P(R|0<n y2 <5), 
and P(R|n y1 > 5), are the probability of redetecting a family at 
yl when n y1 = 0, when 0 < n y , < 5, and when n y1 > 5, re- 
spectively. Let yf be any year from yl to 2007 and let E be one of 
the following events: n yl = 0, 0 < n y7 < 5, or n yl > 5. Then: 

P(R|£) = PK/>0|£) = ' g p (£) g • (2) 

The quantities P(E\n yf >0), P(n y/ >0), and P(E) can be 
estimated from the data. If N different families are present 
in the sample, the estimate of the number redetected be- 
tween 2003 and 2007 is n s = P(R) x N. 

We estimated the expected number of redetected fam- 
ilies n s overall (for the entire sample), and we used a chi- 
square test to compare it with the observed estimate of the 
number of redetected families computed separately for 
each category (monomorphic and polymorphic). The null 
hypothesis is that the number of redetected families in 
a category is the same as the expected number given all 
the data. 

We assessed whether there is significant variation in 
haplotype diversity among communities and years by anal- 
ysis of variance (ANOVA). Patterns of genetic diversity 
were inferred by principal component analysis (PCA). 
The number of repeats is standardized to have a mean of 
zero and variance of 1, and the PCA is conducted on the 
matrix of standardized number of repeats using the svdPca 
method from the R package "pcaMethods." We tested for 
significant differentiation of the regional (ZG, PA, NR, LL) 
and temporal (2003-2005, 2006-2007) samples on the first 
three PCs through ANOVA using the R statistical package. 

Data and Results 

Genetic Diversity Characterized by Haplotype 
Profiles 

In total, this study included 217 different P. falciparum 
infections occurring between 2003 and 2007 in the four 
villages comprising the community of Zungarococha. 
The 217 infections were randomly selected to obtain a sam- 
pling density of approximately 25% of all infections detected 
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Table 1. All Infections Detected in Study by Year, by Village, and Number of Plasmodium falciparum Clones Identified. 



Village 


Infection Breakdown 


2003, (N = 34) 


2004, (N = 63) 


2005, (N = 114) 


2006, (N = 70) 


2007, (N = 21) 


PA (N = 89) 


Number of infections identified 


25 


56 


70 


18 


17 




Number of infections tested 


10 


9 


25 


12 


5 




Number of clones 


18 


10 


39 


15 


7 


NR (N — 99) 


Number of infections identified 


28 


94 


118 


38 


8 




Number of infections tested 


6 


13 


26 


19 


5 




Number of clones 


12 


22 


34 


25 


6 


LL (N = 67) 


Number of infections identified 


2 


31 


102 


50 


4 




Number of infections tested 


0 


13 


18 


20 


1 




Number of clones 


0 


15 


26 


22 


4 


ZC (N = 47) 


Number of infections identified 


11 


66 


68 


18 


15 




Number of infections tested 


2 


11 


13 


6 


3 




Number of clones 


4 


16 


15 


8 


4 



in each year and village (table 1). Using our 14 MS marker- 
defined haplotypes to define clones, there were 144 single- 
clone infections and 73 complex infections detected. Of the 
73 complex infections detected, 38 were easily translated in- 
to single-clone infections due to having polymorphisms in 
only one locus, whereas 23 had polymorphisms at two dif- 
ferent loci and 12 had polymorphisms at more than two loci. 
Therefore, potential haplotype mischaracterization (incor- 
rect phasing, "misalignment," of the polymorphisms defining 
each clone) would be limited to 35 infections. Performing 
our analyses (presented below) with and without these 
35 complex infections showed that inferring the phasing 
of these haplotypes did not alter the general results. 

There were 182 unique MS haplotypes detected in the 
302 parasite infections (within 217 individuals). 

Global-Level Genetic Diversity 

PLD, alleles detected, allele counts, and expected heterozy- 
gosity (H e ) were calculated (table 2). Previous studies re- 
ported basic measurements of PLD using eight of the 
same markers (C4M69, PolyA, PF2802, TA42, TA1, 
TA109, ARA2, and PFPK2) in different geographic regions 



(compared in table 3 and fig. 1). Most of the allele size 
ranges at each MS marker observed in this study were 
within the range of those previously reported. The only ex- 
ception was the range of allele sizes detected at the C4M69 
locus, which were approximately 50-65 bp longer than the 
allele sizes reported in Bogreau et al. (2006). 

We noted two distinctions comparing our Peru data 
with the prior studies. First, the Peruvian Amazon popula- 
tion had an elevated mean number of alleles per locus (by 
approximately a factor of 2) when compared with regions 
of similar hypoendemicity in Central and Eastern Brazil, Bo- 
livia, and Columbia (Anderson et al. 2000; Hoffmann et al. 
2003) (see table 3 and fig. 1). The mean number of alleles 
detected in Peru was more comparable with reports from 
the Brazilian Amazon (Machado et al. 2004; Dalla Martha 
et al. 2007) and South east Asia which included sites in Viet- 
nam, Thailand, and Papua New Guinea (PNG) (Anderson 
et al. 2000; Hoffmann et al. 2003) (fig. 1). Secondly, the dif- 
ference between the mean H e and allele count was lower 
than the difference observed in the Brazil, Bolivia, and Viet- 
nam data (Anderson et al. 2000). Generally, a high differ- 
ence between H e and allele count suggests population 



Table 2. Microsatellite Marker Loci and Basic Genetic Diversity Analyses. 



Alleles, Named "a"-"h" in Order of Frequency 



Locus Name 


Chromosome Number 


a 


b 


c 


d 


e 


f 


g 


h H e 


Stepwise Mutation Model' 1 


C1M4 


1 


206 


192 


220 


210 








0.567 


Excess 


B5M5 


3 


194 


176 


171 


209 








0.229 


Deficient 


C4M69 b 


4 


413 


425 


429 


420 








0.388 


Deficient 


POLYA b 


4 


209 


206 


230 


212 








0.352 


Deficient 


PF2802 b 


5 


140 


167 


156 


146 


126 


164 


177 


183 0.585 


Deficient 


TA42 b 


5 


185 


190 


199 


176 








0.205 


Deficient 


TA1 b 


6 


171 


177 


155 


183 








0.598 


Excess 


TA109 b 


6 


160 


163 


153 










0.338 


Deficient 


BM17 


8 


162 


174 


152 


164 








0.606 


Excess 


C9M11 


9 


130 


140 


122 


158 


185 


132 




0.183 


Deficient 


ARA2 b 


11 


60 


72 


75 


54 


63 






0.651 


Excess 


PFPK2 b 


12 


178 


175 


166 


180 


170 


160 




0.617 


Deficient 


C1M67 


13 


232 


253 


190 


211 


220 


200 


263 


0.255 


Deficient 


C13M13 


13 


144 


148 


132 


136 


174 


153 


162 


0.631 


Deficient 



P value testing if there is an excess H e a P = 0.9 

Note. — a H e was not in excess, indicating there was not a recent population constriction (bottleneck) leading to loss of alleles. The 1AM and SMM was tested, the SMM 
computation more suitable for MS markers (Bottleneck v1.2.02) were tested by a Wilcoxon signed-rank test. 
b Microsatellite loci used in previous studies. Table 3 further considers these 8 markers used in previous studies. 
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Table 3. Number of Alleles Detected Globally. 



Number of Alleles Per Locus 



Study Location 


Endemicity 


C4M69 


POLYA 


Pf2802 


TA42 


TA1 


TA109 


ARA2 


PfPK2 


Mean 


References 


Africa 
























Senegal 


Hyper 








c 
J 




Q 






O.D 


1 orloi-r at al (")(\m\ 

Lecicrc ei ai. ^zuuzj 


Djibouti, Senegal 


Hyper 


-7 




Q 
O 


t> 










/- 7 
0./ 
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bottlenecks (constrictions in alleles in prior years). The He 
not being in excess in Peru did not support a bottleneck or 
decrease in effective population size. 

To test if this Peruvian Amazon population had recently 
undergone bottleneck events, we calculated H e for each lo- 
cus and then compared with the allele count observed at 
each locus using the program Bottleneck vl.2.02 (Cornuet 
and Luikart 1996; Garza and Williamson 2001; Schultz et al. 
2009). H e was in significant excess in only four of the 14 
different loci (ARA2, BM17, C1M4, and TA1) (table 2). 
Using a Wilcoxon signed-rank test and mode-shift analysis, 
we found that this does not support a population that 
has recently undergone a bottleneck (P = 0.9, with normal 
L-shaped distribution). 

Characterizing the Population Structure 
To determine the most accurate number of clusters of re- 
lated clones (K) circulating in this population, a range be- 
tween 1 and 10 clusters was tested (see Materials and 
Methods). A K = 5 consistently had the highest clustered- 
ness value over five simulations for each prespecified K 
value, with K = 0.81. A clear increase in clusteredness val- 
ues approaching K = 5 in both directions, maximizing at 
K = 5, indicated a population denned by five different clus- 



ters. We tested for population differentiation between clus- 
ters using the fixation index, F st (Excoffier and Slatkin 1995; 
Excoffier and Lischer 2010). We found that each of the five 
clusters represented independent subpopulations of 
a larger circulating parasite population (P < 0.01, Pairwise 
Fst)- 

For a more exact definition of clones, we defined hap- 
lotype families (hapfams) using a haplotype-to-haplotype 
pairwise comparison method. The 182 different haplotypes 
(found in the 217 infections) could be grouped into 34 hap- 
fams. Many of the hapfams (14 of 34) included between 3 
and 15 of the 302 clones identified in this study. Three of 
these 34 hapfams described the majority of infections: hap- 
fam #14 at 27.2% (n = 82), #8 at 20.5% (n = 62), and #1 1 at 
10.6% (n = 32). Sixteen of the 34 hapfams were detected 
only one or two times. 

We correlated the results of both population structur- 
ing methods (clusters of related clones and hapfams of 
related clones). Of 302 parasite clones, 286 clones 
(94.7%) were concordant between the hapfam and cluster 
groupings. In 26 of the 34 hapfams defined (n = 231 par- 
asite clones of 302, 76.5% of total), all members (100%) 
were identified within one specific cluster (fig. 2). In six 
of the eight instances where all members of a hapfam 
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Fig. 1. Global perspective of the mean number of alleles per locus versus the mean expected heterozygosity. The mean number of alleles per 
locus is represented by the dark gray bars measured on the primary y axis, and the mean expected heterozygosity (HJ is represented by the 
light gray squares measured on the secondary y axis (error bars indicated standard error of the mean). On the x axis are locations of studies that 
have also examined PLD by utilizing microsatellite markers. Some studies are missing microsatellite markers that were reported in other studies 
(including ours), but this is simply an adaptation from what has been previously reported on a global perspective. As well, some studies report 
observed heterozygosity (H„) rather than H e but are just described synonymously. See table 3 for references. 



did not identify to one specific cluster, more than 85% of 
the members did identify with a specific cluster. Only 16 
(5.3%) clones were in hapfams that grouped to more than 
one cluster. 

Each haplotype with its corresponding hapfam charac- 
terization and cluster characterization is shown in figure 2 
using a minimum spanning network (MSN) algorithm. The 
MSN is divided by early years of the study (2003-2005, fig. 
2A) versus the later years of the study (2006-2007, fig. 2B). 
In 2003-2005, there were approximately 2/3 more in- 
fections detected in this population than in 2006-2007 
(table 1). Despite the lower number of infections (and hap- 
lotypes) detected in the later years, the overall area of the 
MSN spanning area was similar. The haplotypes within 
a given hapfam generally had more allele differences 
(shown by the length of the line in the MSN) in the later 
years. Additionally, it is clear that rather than having high- 
frequency clusters, as indicated by the circle size, the clus- 
ters were more expansive and defined a more diverse set of 
haplotypes (and hapfams) in the later years. Overall, the 
agreement between cluster and hapfam was more appar- 
ent in the early years of the study (2003-2005) versus the 
later years of the study (2006-2007), a likely indication of 
temporal diversification (fig. 2A and B). 



Temporal and Spatial Variation of Clusters 
At a level of year of infection and village of detection, we 
considered the cluster characterization to determine the 
overall maintenance of genetic diversity. Despite the inci- 
dence of infections varying between years, with 2005 hav- 
ing nearly doubled the number of infections as years 2004 
and 2006, and five times the number of infections as in 
2007, most clusters were detected in each year within each 
village of this study (fig. 3). There were only two exceptions: 
cluster 5 was not detected in 2003 and cluster 4 was not 
detected in one village. It is possible that the number of 
infections tested per village/year could explain these few 
absences. Overall, we observed that the clusters were main- 
tained in this population both temporally and spatially. 

Expected heterozygosity (H e ) values within each village 
(H e range = 0.40-0.48) and over time (H e range = 0.33- 
0.45, excluding year 2003) remained relatively constant, de- 
spite some fluctuation in the proportion of each cluster 
temporally and spatially. F st was used to determine if there 
was divergence between these parasite subpopulations. Sig- 
nificant divergence was detected between each village (P < 
0.01, pairwise F st ) except PA and NR and also between each 
year (P < 0.01, pairwise F st ) of this study (fig. 3). We tested if 
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Fic. 2. Minimum spanning network of parasites collected in 2003-2005 (A) and 2006-2007 (B). The MS haplotype is shown with its cluster and 
hapfam characterization. The size of the circle reflects the frequency of the given haplotype in the MSN (A and B, separately). Haplotypes with 
different in less than 3 MS markers were defined as the same hapfam. There were 34 hapfams, and each haplotype is indicated by its hapfam 
(numbered 1-34). Colored circles represent clusters (1-5) as classified by Structure v2.2: orange = cluster 1, light blue = cluster 2, purple = 
cluster 3, green = cluster 4, and magenta = cluster 5. The clusters and hapfams generally agreed, particularly in 2003-2005. In 2006-2007, 
despite the lower number of infections, the genetic differentiation between haplotypes tended to increase (overall area similar in A and B; 
length of lines longer in B). 



the number of hapfams detected was a function of the size 
and frequency of the hapfams parasites (haplotypes) over 
time. First, we established whether the hapfams were 
monomorphic or polymorphic based on the original crite- 
ria used to define hapfams, where members of a hapfam 
could have variation in three or fewer different loci. Hap- 
fams were considered monomorphic if there was a single 
haplotype that represented the entire family. A hapfam was 
considered polymorphic if the haplotype sequences within 
the hapfam did have variation. 

The number of hapfams detected for the first time in the 
study (new), detected from one year to the next (rede- 
tected), and failing to be detected again (lost) were deter- 
mined while considering whether the hapfams were 
monomorphic or polymorphic (fig. 4). Totaling the instan- 
ces of hapfams redetection, the monomorphic hapfams 
were redetected in 15 of 33 (45.5%) possible transitions 
from one year to another, while the polymorphic hapfams 
were redetected 32 of 36 (88.9%) possible transitions from 
one year to another. Moreover, the redetection of polymor- 
phic hapfams relative to monomorphic hapfams was sig- 
nificantly greater (P < 0.0001, y 2 ), with the odds of 
polymorphic hapfam redetection being 8.47-fold higher 
than monomorphic hapfams (odds ratio = 8.47). To con- 
trol for size in families, we used an estimator that takes into 
account the fact that a hapfam with more individuals has 
a higher probability of redetection (eqs. 1 and 2). This es- 
timator is used to compute the redetection of families for 
the entire data set and in each category separately (table 4). 
The expectation of survival for the monomorphic hapfams 



is not significantly different from that computed for all 
hapfams (P = 0.53, y 2 ). However, the expected number 
of redetecting polymorphic hapfams is significantly greater 
than expected for all families (P = 0.0229, y 2 ). This result 
suggests that families that are redetected over the years 
have 1-3 differences. 

Maintenance of Subpopulations and Diversification 
We examined variation in haplotype diversity among years 
and villages and found the presence of significant variation 
in villages (P = 0.017) and time (P < 0.0001). This ANOVA 
suggests that more variation is explained by the temporal 
component. To confirm this result, we performed a PCA of 
genetic variation that revealed patterns of genetic structure 
with time (fig. 5A) rather than village (fig. 5B). Assessing the 
significance of PCs using Tracy-Widom distribution is not 
applicable here because of the small number of markers. 
However, the scores for the first (26.47% of extracted vari- 
ance), second (16.04%), and third (13.31%) principal compo- 
nents are significantly different between infections detected 
in years 2003-2005 and in years 2006-2007 (P = 5.9 x 10" 10 
for PC1, P = 7.4 x 10~ 8 for PC2, and P = 0.031 for PC3). In 
contrast, among the different villages, the scores are signif- 
icantly different but only for the first principal component 
(P = 0.032). 

To investigate the possible emergence of new hapfams 
by immigration, we divided infections by village and then 
by year of detection: 1) those detected in years 2003-2005 
only, 2) those detected in 2003-2005 and also 2006-2007, 
and 3) those detected only in 2006-2007 (table 5). An 
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He = 0.40 



He = 0.43 



He = 0.48 



Cluster 1 
Cluster 2 
Cluster 3 
Cluster 4 
Cluster 5 



He = 0.58 



He = 0.44 



He = 0.38 



He = 0.33 



He = 0.45 



He = 0.42 



Fic. 3. {a, b) Bar plot of clustered infections examined by village-year. Each of the five horizontal x axes represent the year that infections were 
detected, whereas the location of the infection by village, in which the infection was detected, intersects in four locations on the y axes. Where 
each village intersects with a corresponding year is referred to as a village-year, indicative of the infections detected in a particular village 
during a particular year. Individual infections are identified by gray tick marks along the corresponding year x axis. Colored bars represent 
clusters (1-5) as classified by Structure v2.2: orange = cluster 1, light blue = cluster 2, purple = cluster 3, green = cluster 4, and magenta = 
cluster 5. The overall H e values are shown to the right for each study-year plotted and individual villages along the bottom most x axis. 
Individual village-year H e values are shown beneath the infections reported during each respective village-year. Incomplete gaps within 
village-years indicate a lack of infections detected during that village-year. 



immigration event from a village outside the community of 
Zungarococha might be observed if specific hapfams were 
detected de novo in villages throughout the 5 years of this 
study. Of the 19 hapfams that were detected in 2006-2007, 
only five hapfams (8 infections) were not previously de- 
tected between 2003 and 2005. Therefore, we can only sug- 
gest that 8 of the 264 infections detected in the later years 
were possibly attributable to immigration from outside of 
the parasites circulating in this population. 

To consider if hapfams detected in the later years 
evolved from parasites detected in the earlier years, LD 
was calculated for each of the three temporal groups above. 
The average percentage of linked loci per locus for those 
hapfams detected only in years 2003-2005, for those de- 
tected in 2003-2005, and also 2006-2007, and for those 
only detected in 2006-2007 was 41.8%, 72.7%, and 3.0%, 
respectively. There was significant decay in LD observed 



in years 2006-2007 versus 2003-2005 (P < 0.0001, Fishers 
Exact). The 5 hapfams only detected in 2006-2007 were 
comprised of only eight infections. These 5 hapfams could 
have arisen in the villages due to recombination between 
existing haplotypes or due to immigration. The percent of 
linked loci in 2003-2005, when most infections were de- 
tected, suggests that the majority of diversity is not intro- 
duced via immigration of different clones from outside 
these villages. 

Discussion 

Longitudinal cohort studies in recent and low-malaria 
transmission regions enable us to study the characteristics 
associated with the maintenance of endemics. In this study, 
we investigated the population structure dynamics of 
P. falciparum in the community of Zungarococha near 
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Fic. 4. (A, B) Detection, redetection, and loss of monomorphic versus polymorphic hapfams over time. Hapfams are represented within green 
boxes at time of entry. The monomorphic hapfams are shown in the top panel (A) and the polymorphic below (B) with the name of the 
hapfam (numbered 1-34). In both monomorphic and polymorphic hapfams, there can be different haplotypes in each hapfam. Hapfams are 
called the same if <3 MS loci makers are different. Therefore, even in monomorphic hapfams, mutations can occur. We report the number of 
individuals in hapfams (n) for years 2003-2006 was used to compute parameters to estimate the redetection probabilities (see table 4). 



Iquitos, Peru over a 5-year period (2003-2007) of sustained 
low transmission (<0.5 infections/person/year) that 
occurred 5 years postepidemic (1994-1998). Using 14 
MS loci scattered throughout the genome of P. falciparum, 
we found markedly high population-level genetic diversity 
when one considers the relatively low transmission history 
of these villages. Among 217 different infections, we found 
182 different MS haplotypes. Using an unsupervised Bayes- 
ian clustering program, called Structure v2.2, we found five 
clusters of related parasites (population substructure). The 
haplotypes could be subgrouped into 34 hapfams, which 



were a fine-level description of the subpopulations but 
were consistent with the population denned by Structure. 
Having 182 haplotypes that could be grouped into 34 hap- 
fams was surprising given the low and recent transmission 
(Mackinnon and Marsh 2010). 

Our first objective was to characterize this genetic diver- 
sity and population structure, in comparison with other 
geographic regions. Prior MS studies of P. falciparum ge- 
netic diversity in other regions of low transmission report 
a low genetic diversity and evidence for parasite popula- 
tions having undergone genetic bottlenecks (fig. 1). If 
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Table 4. Hapfams Parameters and Estimate of Expected Number 
of Redetecting Families Over Time. 
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22.6 



Note. — a Computed without taking observations for year 2007 into account 
because the redetection rates from 2007 to 2008 are unknown. These 
observations were only considered to count the number of families redetected 
from 2006 to 2007. Therefore, the two families that entered in 2007 were not 
considered in the analysis. 



a parasite population was primarily shaped by an epidemic 
or if there were bottlenecks and/or genetic drift, we would 
expect to observe few parasite haplotypes in the popula- 
tion (a clonal or low diversity population). For example, 
Urdaneta et al. (2001) observed little diversity in Venezuela, 
consistent with this type of epidemic or population de- 
mography. The allele count and H e in our Peruvian cohort 
parasite population gives no evidence of such a bottleneck. 
We found the allele count and H e similar to endemic 
regions of the Brazilian Amazon, Southeast Asia, and Oce- 
ania (fig. 1). 

In our Peru cohort, the failure to detect a population 
bottleneck and the accumulation of genetic diversity over 
time would lead us to predict a large effective population 
size. The MSN graphics show this diversity. Approximately, 
2/3 of all the infections detected occurred in years 2003- 
2005 (table 1 and fig. 2A and B) and even with only 1/3 of 
infections observed, the MSN shows much of this genetic 



diversity in years 2006-2007. Our second objective was to 
consider the population over time, redetection of alleles 
over time and space. The MSN graphic (fig. 2A and B) 
showed that in the later years of the study (fig. 2B), 
haplotypes were more diverged with more allelic changes, 
haplotypes, within hapfams. The polymorphic hapfams in 
the population were more likely to be redetected over suc- 
cessive years relative to the monomorphic families (fig. 4). 
We consider that monomorphic families were lost more 
readily due to genetic drift. On the other hand, redetecting 
the polymorphic families with possible 1-3 allele changes, 
is consistent with large effective population size. In agree- 
ment, the PCA and the ANOVA found significant variation 
of genetic diversity by year and village, with more variation 
being explainable by year (fig. 5a and fa). Although immi- 
gration events were possible, they appear unlikely to ex- 
plain all the instances of detecting parasites different by 
1-3 alleles over the years of this study (table 5). Agree- 
ments of haplotype clustering in early versus later years 
of the study (fig. 2A and B) also provide evidence that di- 
versity is higher than expected given the transmission rate 
and that this diversity occurs within the re-detected 
hapfams. 

The maintenance of diversity over time suggests we have 
a large population size. The effective population size (N e ) 
can be estimated by a formula with H e of the allele frequen- 
cies and mutation rates in MS markers (see Iwagami et al. 
2009) or by the conventional methods using nucleotide 
polymorphisms in noncoding regions (Mu et al. 2002). How- 
ever, the N e calculation assumes that the mutation rates and 
transmission rates are constant over time and homogenous 
over space. In this study, we found that the N e is most likely 
impacted by the maintenance and amplification of genetic 
diversity over time. 

There has been one longitudinal MS study published, 
which considered 44 infections that occurred between 




-3 -2 -1 0 1 2 3 4 -3 -2 -1 0 1 2 3 4 

PC 1 PC 1 



Fig. 5. (A, B) Graphic representation of the first two principle components for 206 individual infections genotyped with 14 microsatellites 
markers. Color code shows subgroups of infections partitioned (A) by year of sampling and (B) by villages. The level of genetic variation 
detected is more influenced by the year of collection, than the village of collection. 
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Table 5. Hapfam Persistence in Villages. 
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18 


1 4 


31 32 


NR 


7 


8 


9 


10 


11 


14 






23 


31 34 


PA 


7 


8 


9 


10 




14 


15 


18 




13 


ZC 




8 






11 12 


14 


15 


18 


17 


33 


Total haplotypes (n) 


10 6 


62 


10 


7 


32 7 


82 


15 


8 


4 5 5 3 


3 2 111 



Note. — The 19 hapfams (named by number and shown by numbers in columns in order below) that were detected in 2006-2007 were divided into two groups: those that 
were detected earlier (2003-2005), and those that were not detected earlier in Zungarococha. 



2004 and 2006 in Grenada, Brazil and 11 infections that 
occurred in 2008 in Acre, Brazil (Orjuela-Sanchez et al. 
2009). Using Structure, they found three different popula- 
tions of parasites that appeared, disappeared, and/or reap- 
peared in the population in a manner consistent with 
genetic drift. In contrast, in our longitudinal study, we re- 
detect each of the five main clusters in each of the 5 years. 
The smaller sample size and the distance separating Gre- 
nada and Acre (60 km) may have contributed their detect- 
ing bottlenecks, genetic drift with only certain quite 
different alleles being detected over the successive years. 

Our finding maintenance of hapfams with 1-3 allele dif- 
ferences over time might be reflective of differences in the 
immunoepidemiology within these regions in Brazil versus 
Zungarococha, Peru. We define immunoepidemology as 
the interaction of parasite infection and host susceptibility 
to infection. In Peru, we find asymptomatic infections, 
polymorphisms in antigen encoding genes, and strong hu- 
man host antibody responses (Branch et al. 2005; Torres 
et al. 2008; Sutton et al. 2010). It is plausible that this 
may contribute to the maintenance and/or amplification 
of genetic diversity by impacting which parasites infect 
or are transmitted from host to host. We propose that se- 
quencing antigen encoding genes near the MS markers is 
needed to understand the immunoepidemology on genetic 
diversity in this population. 

Considering the presumed neutral MS markers, there is 
a direct relationship between endemicity, recombination 
rates, and population size (Mackinnon and Marsh 2010). 
However, if there was a nonhomogeneously distributed in- 
fection frequency perhaps due to endemicity and/or 
changing mixed-clone infection prevalence, we could imag- 
ine detecting a large effective population size even in an 
overall average low-transmission region. Recombination 
is obviously limited by the opportunity for outcrossing be- 
tween different clones (Anderson et al. 2000; Durand et al. 
2003). Here, we report that only 33.6% (73/217) of infec- 
tions were mixed-clone infections (table 1). A 33.6% mixed 
infection rate would predict more than 80% linkage of 
genes distanced by more than 1/3 of the length of most 
P. falciparum chromosomes (Conway et al. 1999; Sakihama 
et al. 1999). However, the frequency or types of mixed- 
clone infections could fluctuate or change over time. 
For example, the low LD in the last year suggests recom- 
bination in the last years, which could result in a larger ef- 



fective population size in the last years despite the overall 
decreasing epidemiologic transmission in the last years. 

Conclusions 

We find that P. falciparum emergence and then establish- 
ment of low continued transmission in Iquitos, Peru, is 
characterized by a considerably high genetic diversity 
and maintenance of this diversity temporally and spatially. 
Although malaria researchers have often proposed that 
there is maintenance of genetic diversity in endemic re- 
gions where parasites reinfect the human hosts, this study 
provides the first genome-wide real-time insight into such 
a population structure that might impact our global efforts 
of malaria elimination. 

Supplementary Material 

Supplementary figure 1 is available at Molecular Biology and 
Evolution online (http://www.mbe.oxfordjournals.org/). 
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